Chain of Execution Supervision Promotes General Reasoning in Large Language Models
Chen, Nuo, Li, Zehua, Bao, Keqin, Lin, Junyang, Liu, Dayiheng
Building robust and general reasoning ability is a central goal in the development of large language models (LLMs). Recent efforts increasingly turn to code as a rich training source, given its inherent logical structure and diverse reasoning paradigms such as divide-and-conquer, topological ordering, and enumeration. However, reasoning in code is often expressed implicitly and entangled with syntactic or implementation noise, making direct training on raw code suboptimal. To address this, we introduce TracePile, a large-scale corpus of 2.6 million samples that transforms code execution into explicit, step-by-step chain-of-thought-style rationales, which we call Chain of Execution (CoE). The corpus spans domains including mathematics, classical algorithms, and algorithmic competitions, and is enriched with variable-tracing questions and code rewritings to enhance logical granularity and code diversity. We evaluate TracePile using three training setups: continued pretraining, instruction tuning after pretraining, and two-stage finetuning. Experiments across four base models (LLaMA 3, LLaMA 3.1, Qwen-2.5, and Qwen-2.5 Coder) and 20 benchmarks covering math, code, logic, and algorithms demonstrate consistent improvements. Notably, TracePile boosts LLaMA3.1-8B by 7.1% on average across nine math datasets and delivers clear gains on LiveCodeBench, CRUX, and MMLU under two-stage fine-tuning.
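The core idea of a chain-of-execution trace, recording variable values line by line as a program runs, can be sketched with Python's standard tracing hook. This is an illustrative reconstruction, not the TracePile pipeline itself; the function names are invented for the example.

```python
import sys

def trace_execution(func, *args):
    """Record one (line number, local variables) entry per executed line,
    similar in spirit to a chain-of-execution rationale."""
    steps = []
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            steps.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, steps

def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

result, steps = trace_execution(gcd, 12, 18)
print(result)  # 6
for lineno, locs in steps:
    print(lineno, locs)  # e.g. values of a, b at each executed line
```

Each recorded step can then be verbalized ("now a=18, b=12, so a % b = 6 ...") to produce the kind of explicit rationale the corpus contains.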
CausalMACE: Causality Empowered Multi-Agents in Minecraft Cooperative Tasks
Chai, Qi, Zheng, Zhang, Ren, Junlong, Ye, Deheng, Lin, Zichuan, Wang, Hao
Minecraft, as an open-world virtual interactive environment, has become a prominent platform for research on agent decision-making and execution. Existing works primarily adopt a single Large Language Model (LLM) agent to complete various in-game tasks. However, for complex tasks requiring lengthy sequences of actions, single-agent approaches often face challenges related to inefficiency and limited fault tolerance. Despite these issues, research on multi-agent collaboration remains scarce. In this paper, we propose CausalMACE, a holistic causality planning framework designed to enhance multi-agent systems, in which we incorporate causality to manage dependencies among subtasks. Technically, our proposed framework introduces two modules: an overarching task graph for global task planning and a causality-based module for dependency management, where inherent rules are adopted to perform causal intervention. Experimental results demonstrate our approach achieves state-of-the-art performance in multi-agent cooperative tasks of Minecraft.
HoarePrompt: Structural Reasoning About Program Correctness in Natural Language
Bouras, Dimitrios Stamatios, Dai, Yihan, Wang, Tairan, Xiong, Yingfei, Mechtaev, Sergey
While software requirements are often expressed in natural language, verifying the correctness of a program against natural language requirements is a hard and underexplored problem. Large language models (LLMs) are promising candidates for addressing this challenge; however, our experience shows that they are ineffective in this task, often failing to detect even straightforward bugs. To address this gap, we introduce HoarePrompt, a novel approach that adapts fundamental ideas from program analysis and verification to natural language artifacts. Drawing inspiration from the strongest postcondition calculus, HoarePrompt employs a systematic, step-by-step process in which an LLM generates natural language descriptions of reachable program states at various points in the code. To manage loops, we propose few-shot-driven k-induction, an adaptation of the k-induction method widely used in model checking. Once program states are described, HoarePrompt leverages the LLM to assess whether the program, annotated with these state descriptions, conforms to the natural language requirements. For evaluating the quality of classifiers of program correctness with respect to natural language requirements, we constructed CoCoClaNeL, a challenging dataset of solutions to programming competition problems. Our experiments show that HoarePrompt improves the MCC by 62% compared to directly using Zero-shot-CoT prompts for correctness classification. Furthermore, HoarePrompt outperforms a classifier that assesses correctness via LLM-based test generation by increasing the MCC by 93%. The inductive reasoning mechanism contributes a 28% boost to MCC, underscoring its effectiveness in managing loops.
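To make the k-induction idea concrete, here is a minimal sketch of its base case on a toy transition system: checking that a loop invariant holds on the first k reachable states. HoarePrompt performs the analogous reasoning in natural language via an LLM; the state encoding and invariant below are invented for illustration.

```python
def k_induction_base(init, step, invariant, k):
    """Base case of k-induction: the invariant holds on the first k
    states reachable from init under the transition function step."""
    state = init
    for _ in range(k):
        if not invariant(state):
            return False
        state = step(state)
    return True

# Toy loop: "total = 0; for i in range(n): total += i".
# State is (i, total); the invariant says total == 0 + 1 + ... + (i-1).
step = lambda s: (s[0] + 1, s[1] + s[0])
invariant = lambda s: s[1] == s[0] * (s[0] - 1) // 2
print(k_induction_base((0, 0), step, invariant, k=5))  # True
```

The inductive step (any k consecutive invariant-satisfying states yield an invariant-satisfying successor) completes the method; here only the base case is shown.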
Towards Making Flowchart Images Machine Interpretable
Shukla, Shreya, Gatti, Prajwal, Kumar, Yogesh, Yadav, Vikash, Mishra, Anand
Computer programming textbooks and software documentation often contain flowcharts to illustrate the flow of an algorithm or procedure. Modern OCR engines often tag these flowcharts as graphics and ignore them in further processing. In this paper, we work towards making flowchart images machine-interpretable by converting them to executable Python code. To this end, inspired by the recent success in natural language to code generation literature, we present a novel transformer-based framework, namely FloCo-T5. Our model is well-suited for this task, as it can effectively learn semantics, structure, and patterns of programming languages, which it leverages to generate syntactically correct code. We also use a task-specific pre-training objective to pre-train FloCo-T5 on a large number of logic-preserving augmented code samples. Further, to perform a rigorous study of this problem, we introduce the FloCo dataset, which contains 11,884 flowchart images and their corresponding Python code. Our experiments show promising results, and FloCo-T5 clearly outperforms related competitive baselines on code generation metrics. We make our dataset and implementation publicly available.
Pseudocode-Injection Magic: Enabling LLMs to Tackle Graph Computational Tasks
Gong, Chang, Bian, Wanrui, Zhang, Zhijie, Zheng, Weiguo
Graph computational tasks are inherently challenging and often demand the development of advanced algorithms for effective solutions. With the emergence of large language models (LLMs), researchers have begun investigating their potential to address these tasks. However, existing approaches are constrained by LLMs' limited capability to comprehend complex graph structures and their high inference costs, rendering them impractical for handling large-scale graphs. Inspired by human approaches to graph problems, we introduce a novel framework, PIE (Pseudocode-Injection-Enhanced LLM Reasoning for Graph Computational Tasks), which consists of three key steps: problem understanding, prompt design, and code generation. In this framework, LLMs are tasked with understanding the problem and extracting relevant information to generate correct code. The responsibility for analyzing the graph structure and executing the code is delegated to the interpreter. We inject task-related pseudocode into the prompts to further assist the LLMs in generating efficient code. We also employ cost-effective trial-and-error techniques to ensure that the LLM-generated code executes correctly. Unlike other methods that require invoking LLMs for each individual test case, PIE only calls the LLM during the code generation phase, allowing the generated code to be reused and significantly reducing inference costs. Extensive experiments demonstrate that PIE outperforms existing baselines in terms of both accuracy and computational efficiency.
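The generate-once, reuse-everywhere idea behind PIE can be sketched as follows. The LLM call is stubbed out with a fixed string, and the pseudocode, prompt, and function names are invented for illustration; the point is that the model is invoked once, and the compiled function is then reused across every graph instance.

```python
from collections import deque

TASK_PSEUDOCODE = """
BFS shortest path:
  dist[source] = 0; push source onto queue
  pop u; for each unvisited neighbor v: dist[v] = dist[u] + 1
"""

def fake_llm_generate(prompt):
    # Stand-in for the single code-generation call to the LLM.
    return '''
from collections import deque
def shortest_path(adj, s, t):
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist.get(t, -1)
'''

# One LLM call; the pseudocode is injected into the prompt.
namespace = {}
exec(fake_llm_generate("Solve shortest path.\n" + TASK_PSEUDOCODE), namespace)
solve = namespace["shortest_path"]

# The compiled function is reused on every test case at zero LLM cost.
graph = {0: [1, 2], 1: [3], 2: [3]}
print(solve(graph, 0, 3))  # 2
```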
Long-Range Tasks Using Short-Context LLMs: Incremental Reasoning With Structured Memories
Jayalath, Dulhan, Wendt, James Bradley, Monath, Nicholas, Tata, Sandeep, Gunel, Beliz
Long-range tasks require reasoning over long inputs. Existing solutions either need large compute budgets, training data, access to model weights, or use complex, task-specific approaches. We present PRISM, which alleviates these concerns by processing information as a stream of chunks, maintaining a structured in-context memory specified by a typed hierarchy schema. This approach demonstrates superior performance to baselines on diverse tasks while using at least 4x smaller contexts than long-context models. Moreover, PRISM is token-efficient. By producing short outputs and efficiently leveraging key-value (KV) caches, it achieves up to 54% cost reduction when compared to alternative short-context approaches. The method also scales down to tiny information chunks (e.g., 500 tokens) without increasing the number of tokens encoded or sacrificing quality. Furthermore, we show that it is possible to generate schemas to generalize our approach to new tasks with minimal effort.
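The chunk-streaming pattern can be illustrated with a deterministic stand-in for the LLM update step: a long input is processed in small chunks while only a small, typed memory stays in context. The schema and update logic below are invented for the sketch; in PRISM the update is performed by a short-context LLM against a typed hierarchy schema.

```python
# Toy typed memory (stands in for PRISM's hierarchy schema).
memory = {"characters": set(), "word_count": 0}

def update_memory(memory, chunk):
    """Deterministic stand-in for the LLM's memory update: count words
    and collect capitalized names longer than three characters."""
    words = chunk.split()
    memory["word_count"] += len(words)
    memory["characters"] |= {w for w in words if w.istitle() and len(w) > 3}
    return memory

document = "Alice met Bob near the river. " * 50
chunk_size = 10  # words per chunk, far below the full document length
words = document.split()
for i in range(0, len(words), chunk_size):
    memory = update_memory(memory, " ".join(words[i:i + chunk_size]))

print(memory["word_count"])            # 300
print(sorted(memory["characters"]))    # ['Alice']
```

Because only the compact memory (not the full text) is carried between chunks, the context needed per step stays constant regardless of input length, which is what enables the small-context savings reported above.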
Towards a Robotic Intrusion Prevention System: Combining Security and Safety in Cognitive Social Robots
Martín, Francisco, Soriano-Salvador, Enrique, Guerrero, José Miguel, Múzquiz, Gorka Guardiola, Manzanares, Juan Carlos, Rodríguez, Francisco J.
Social robots need to be safe and reliable to share their space with humans. This paper reports on the first results of a research project that aims to create safer, more reliable intelligent autonomous robots by investigating the implications of, and interactions between, cybersecurity and safety. We propose a robotic intrusion prevention system (RIPS) that follows a novel approach to detect and mitigate intrusions in cognitive social robot systems and other cyber-physical systems. The RIPS detects threats at the robotic communication level and mitigates cyber-physical threats by using System Modes to define which parts of the robotic system reduce or limit their functionality while the system is compromised. We demonstrate the validity of our approach by applying it to a cognitive architecture running on a real social robot that preserves the privacy and safety of humans while facing several cyber attack situations.
DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs
Fang, Haishuo, Zhu, Xiaodan, Gurevych, Iryna
Answering Questions over Knowledge Graphs (KGQA) is key to well-functioning autonomous language agents in various real-life applications. To improve the neural-symbolic reasoning capabilities of language agents powered by Large Language Models (LLMs) in KGQA, we propose the Decomposition-Alignment-Reasoning Agent (DARA) framework. DARA effectively parses questions into formal queries through a dual mechanism: high-level iterative task decomposition and low-level task grounding. Importantly, DARA can be efficiently trained with a small number of high-quality reasoning trajectories. Our experimental results demonstrate that DARA fine-tuned on LLMs (e.g. Llama-2-7B, Mistral) outperforms both in-context learning-based agents with GPT-4 and alternative fine-tuned agents across different benchmarks in zero-shot evaluation, making such models more accessible for real-life applications. We also show that DARA attains performance comparable to state-of-the-art enumerating-and-ranking-based methods for KGQA.
Examples of Initialization Techniques in Deep Learning
Initialization is a crucial step in deep learning that assigns initial values to a neural network's weights and biases before training. The choice of initialization technique can significantly impact a network's ability to learn and generalize. In this article, we'll explore the importance of initialization in deep learning and the common techniques used. To build a neural network using the three initialization methods described in the introduction, we will use the provided model() function. This function implements a three-layer neural network with a linear activation in the input layer, a rectified linear unit (ReLU) activation in the hidden layers, and a sigmoid activation in the output layer.
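A minimal sketch of the three classic schemes such an article typically compares (zeros, large random, and He initialization) is shown below. The function name and the exact layer sizes are illustrative, not the article's provided `model()` function, but the scaling factors follow common practice.

```python
import numpy as np

def initialize_parameters(layer_dims, method="he", seed=0):
    """Initialize weights and biases for a fully connected network.
    layer_dims is e.g. [n_x, n_h1, n_h2, 1] for a three-layer model."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        shape = (layer_dims[l], layer_dims[l - 1])
        if method == "zeros":
            W = np.zeros(shape)                      # fails to break symmetry
        elif method == "random":
            W = rng.standard_normal(shape) * 10      # large values: exploding activations
        elif method == "he":
            # He initialization: variance 2/fan_in, suited to ReLU layers.
            W = rng.standard_normal(shape) * np.sqrt(2.0 / layer_dims[l - 1])
        params[f"W{l}"] = W
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))  # biases start at zero
    return params

params = initialize_parameters([4, 10, 5, 1], method="he")
print(params["W1"].shape)  # (10, 4)
```

Zero initialization makes every unit in a layer compute the same gradient, so the network never breaks symmetry; He initialization keeps activation variance roughly constant across ReLU layers, which is why it is the usual default here.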
Python NumPy Tutorial - 2022
So you've learned the basics of Python and you're looking for a more powerful way to analyse data? NumPy is what you need. NumPy, which stands for Numerical Python, is a Python library consisting of multidimensional array and matrix objects and a collection of routines for processing those arrays, including support for linear algebra and signal processing. So if you need to do mathematical or logical operations on your data, NumPy is probably the library for you. In this tutorial, we'll show you how to use NumPy to its full potential: you'll learn more about arrays and how to operate on them using mathematical functions. We will cover what NumPy is, the data types it supports, and more. NumPy was created in 2005 by Travis Oliphant.